Results 1 - 20 of 3,283
1.
Sci Rep ; 14(1): 9275, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654130

ABSTRACT

Transcription factors (TFs) are crucial epigenetic regulators, which enable cells to dynamically adjust gene expression in response to environmental signals. Computational procedures like digital genomic footprinting on chromatin accessibility assays such as ATAC-seq can be used to identify bound TFs on a genome-wide scale. This method utilizes short regions of low accessibility signal caused by steric hindrance of DNA-bound proteins, called footprints (FPs), which are combined with motif databases for TF identification. However, while over 1600 TFs have been described in the human genome, only ~700 of these have a known binding motif. Thus, a substantial number of FPs without overlap to a known DNA motif are normally discarded from FP analysis. In addition, the FP method is restricted to organisms with a substantial number of known TF motifs. Here we present DENIS (DE Novo motIf diScovery), a framework to generate and systematically investigate the potential of de novo TF motif discovery from FPs. DENIS includes functionality (1) to isolate FPs without binding motifs, (2) to perform de novo motif generation and (3) to characterize novel motifs. We show that the framework rediscovers artificially removed TF motifs, quantifies de novo motif usage in an example dataset of early embryonic development, and is able to analyze and uncover TF activity in organisms lacking canonical motifs. The latter task is exemplified by an investigation of a scATAC-seq dataset in zebrafish which covers different cell types during hematopoiesis.
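Step (1) of the abstract, isolating footprints that overlap no known motif, can be illustrated with a plain interval-overlap filter. This is only a minimal sketch under assumed conventions: the function name, the `(chrom, start, end)` tuple layout, and the half-open coordinates are hypothetical and are not taken from DENIS itself.

```python
def footprints_without_motif(footprints, motif_hits):
    """Return the footprints that overlap no motif hit.

    Both inputs are lists of (chrom, start, end) tuples using
    half-open coordinates [start, end). Illustrative layout only.
    """
    # Group motif hits by chromosome so each footprint is compared
    # only against hits on its own chromosome.
    hits_by_chrom = {}
    for chrom, start, end in motif_hits:
        hits_by_chrom.setdefault(chrom, []).append((start, end))

    unexplained = []
    for chrom, fp_start, fp_end in footprints:
        overlaps = any(
            hit_start < fp_end and fp_start < hit_end
            for hit_start, hit_end in hits_by_chrom.get(chrom, [])
        )
        if not overlaps:
            unexplained.append((chrom, fp_start, fp_end))
    return unexplained


footprints = [("chr1", 100, 120), ("chr1", 300, 325), ("chr2", 50, 70)]
motif_hits = [("chr1", 110, 118), ("chr2", 500, 510)]
unexplained = footprints_without_motif(footprints, motif_hits)
print(unexplained)
```

The quadratic scan is acceptable for a toy example; genome-scale data would normally call for sorted intervals or an interval tree.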


Subjects
Chromatin Immunoprecipitation Sequencing , Nucleotide Motifs , Transcription Factors , Zebrafish , Transcription Factors/metabolism , Transcription Factors/genetics , Animals , Zebrafish/genetics , Zebrafish/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Humans , Binding Sites , Protein Binding , DNA Footprinting/methods , Computational Biology/methods , Chromatin/metabolism , Chromatin/genetics
2.
J Chem Inf Model ; 64(8): 3237-3247, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38600752

ABSTRACT

Popular RNA-guided DNA endonuclease Cas9 from Streptococcus pyogenes (SpCas9) recognizes the canonical 5'-NGG-3' protospacer adjacent motif (PAM) and triggers double-stranded DNA cleavage activity. Mutations in SpCas9 were demonstrated to expand the PAM readability and hold promise for therapeutic and genome editing applications. However, the energetics of the PAM recognition and its relation to the atomic structure remain unknown. Using the X-ray structure (precatalytic SpCas9:sgRNA:dsDNA) as a template, we calculated the change in the PAM binding affinity in response to SpCas9 mutations using computer simulations. The E1219V mutation in SpCas9 fine-tunes the water accessibility in the PAM binding pocket and promotes new interactions in the SpCas9:noncanonical T-rich PAM, thus weakening the PAM stringency. The nucleotide-specific interaction of two arginine residues (i.e., R1333 and R1335 of SpCas9) ensured stringent 5'-NGG-3' PAM recognition. R1335A substitution (SpCas9R1335A) completely disrupts the direct interaction between SpCas9 and PAM sequences (canonical or noncanonical), accounting for the loss of editing activity. Interestingly, the double mutant (SpCas9R1335A,E1219V) boosts DNA binding affinity by favoring protein:PAM electrostatic contact in a desolvated pocket. The underlying thermodynamics explain the varied DNA cleavage activity of SpCas9 variants. A direct link between the energetics, structures, and activity is highlighted, which can aid in the rational design of improved SpCas9-based genome editing tools.
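The canonical 5'-NGG-3' PAM mentioned above is easy to locate in a sequence. The sketch below is an illustration only (the function name is hypothetical); it scans a single strand and does not model binding energetics or any SpCas9 variant.

```python
import re


def find_ngg_pams(seq):
    """Return 0-based start positions of 5'-NGG-3' sites on the given strand."""
    seq = seq.upper()
    # A zero-width lookahead reports overlapping sites as well.
    return [match.start() for match in re.finditer(r"(?=[ACGT]GG)", seq)]


pam_positions = find_ngg_pams("ATGGGCCTGGA")
print(pam_positions)
```

Scanning the reverse strand would require running the same function on the reverse complement of the sequence.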


Subjects
CRISPR-Associated Protein 9 , Mutation , Streptococcus pyogenes , Streptococcus pyogenes/enzymology , CRISPR-Associated Protein 9/metabolism , CRISPR-Associated Protein 9/chemistry , CRISPR-Associated Protein 9/genetics , Molecular Dynamics Simulation , Nucleotide Motifs , DNA/metabolism , DNA/chemistry , Protein Conformation , Models, Molecular , Thermodynamics , Protein Binding
3.
BMC Bioinformatics ; 25(1): 128, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38528492

ABSTRACT

BACKGROUND: Discovering biological motifs plays a fundamental role in understanding regulatory mechanisms. Computationally, motifs can be efficiently represented as k-mers, making the counting of these elements critical for both the accuracy and the efficiency of the analysis. This is particularly useful in scenarios involving large data volumes, such as those generated by the ChIP-seq protocol. Against this backdrop, we introduce BIOMAPP::CHIP, a tool specifically designed to optimize the discovery of biological motifs in large data volumes. RESULTS: We conducted a comprehensive set of comparative tests with state-of-the-art algorithms. Our analyses revealed that BIOMAPP::CHIP outperforms existing approaches in various metrics, excelling in terms of both performance and accuracy. The tests demonstrated a higher detection rate of significant motifs and faster execution of the algorithm. Furthermore, the SMT component played a vital role in the system's efficiency, proving to be both fast and accurate in k-mer counting, which in turn improved the overall efficacy of our tool. CONCLUSION: BIOMAPP::CHIP represents a real advancement in the discovery of biological motifs, particularly in large data volume scenarios, offers a relevant alternative for the analysis of ChIP-seq data, and has the potential to boost future research in the field. This software can be found at the following address: (https://github.com/jadermcg/biomapp-chip).
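To make the k-mer counting step concrete, here is a minimal sketch using a plain hash-based counter. It is not the SMT component described in the abstract, whose internals are not given here; the function name and the choice to skip windows containing ambiguous bases are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer made only of A, C, G, T across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


peaks = ["ACGTACGT", "TTACGTNA"]
kmer_counts = count_kmers(peaks, 4)
print(kmer_counts.most_common(3))
```

A dictionary-based counter like this uses memory proportional to the number of distinct k-mers, which is one reason dedicated tools rely on more compact structures for large ChIP-seq datasets.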


Subjects
Algorithms , Software , Sequence Analysis, DNA/methods , Chromatin Immunoprecipitation/methods , Binding Sites , Nucleotide Motifs
4.
J Chem Phys ; 160(11), 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38506297

ABSTRACT

Activator protein-1 (AP-1) comprises one of the largest and most evolutionarily conserved families of ubiquitous eukaryotic transcription factors that act as pioneer factors. The diversity of AP-1 DNA binding interactions through a conserved basic-zipper (bZIP) domain calls for an in-depth understanding of how AP-1 achieves its DNA binding selectivity and, consequently, its gene regulation specificity. Here, we address the structural and dynamical aspects of the DNA target recognition process of AP-1 using microsecond-long atomistic simulations based on the structure of the human AP-1 FosB/JunD bZIP-DNA complex. Our results show the unique role of DNA shape features in selective base-specific interactions, characteristic ion population, and solvation properties of DNA grooves in forming the motif sequence-specific AP-1-DNA complex. The TpG step at the two termini of the AP-1 site plays an important role in the structural adjustment of DNA by modifying the helical twist in the AP-1 bound state. We addressed the role of intrinsic motion of the bZIP domain, in terms of opening and closing gripper motions of the DNA binding helices, in target site recognition and binding of AP-1 factors. Our observations suggest that binding to the cognate motif in DNA is mainly accompanied by the precise adjustment of the closing gripper motion of the DNA binding helices of the bZIP domain.


Subjects
DNA , Transcription Factor AP-1 , Humans , Transcription Factor AP-1/metabolism , Nucleotide Motifs , DNA/chemistry , Binding Sites , Protein Binding
5.
Proc Natl Acad Sci U S A ; 121(11): e2309469121, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38442181

ABSTRACT

The early-life environment can profoundly shape the trajectory of an animal's life, even years or decades later. One mechanism proposed to contribute to these early-life effects is DNA methylation. However, the frequency and functional importance of DNA methylation in shaping early-life effects on adult outcomes is poorly understood, especially in natural populations. Here, we integrate prospectively collected data on fitness-associated variation in the early environment with DNA methylation estimates at 477,270 CpG sites in 256 wild baboons. We find highly heterogeneous relationships between the early-life environment and DNA methylation in adulthood: aspects of the environment linked to resource limitation (e.g., low-quality habitat, early-life drought) are associated with many more CpG sites than other types of environmental stressors (e.g., low maternal social status). Sites associated with early resource limitation are enriched in gene bodies and putative enhancers, suggesting they are functionally relevant. Indeed, by deploying a baboon-specific, massively parallel reporter assay, we show that a subset of windows containing these sites are capable of regulatory activity, and that, for 88% of early drought-associated sites in these regulatory windows, enhancer activity is DNA methylation-dependent. Together, our results support the idea that DNA methylation patterns contain a persistent signature of the early-life environment. However, they also indicate that not all environmental exposures leave an equivalent mark and suggest that socioenvironmental variation at the time of sampling is more likely to be functionally important. Thus, multiple mechanisms must converge to explain early-life effects on fitness-related traits.


Subjects
Adverse Childhood Experiences , DNA Methylation , Animals , Nucleotide Motifs , Biological Assay , Papio/genetics
6.
Methods Enzymol ; 695: 233-254, 2024.
Article in English | MEDLINE | ID: mdl-38521587

ABSTRACT

i-Motifs are non-canonical secondary structures of DNA formed by mutual intercalation of hemi-protonated cytosine-cytosine base pairs, most typically in slightly acidic conditions (pH<7.0). These structures are well-studied in vitro and have recently been suggested to exist in cells. Despite nearly a decade of active research, the quest for small-molecule ligands that could selectively bind to and stabilize i-motifs continues, and no reference, bona fide i-motif ligand is currently available. This is, at least in part, due to the lack of robust methods to assess the interaction of ligands with i-motifs, since many techniques well-established for studies of other secondary structures (such as CD-, UV-, and FRET-melting) may generate artifacts when applied to i-motifs. Here, we describe an implementation of automated, potentiometric (pH) titrations as a robust isothermal method to assess the impact of ligands or cosolutes on thermodynamic stability of i-motifs. This approach is validated through the use of a cosolute previously known to stabilize i-motifs (PEG2000) and three small-molecule ligands that are able to stabilize, destabilize, or have no effect on the stability of i-motifs, respectively.


Subjects
Cytosine , DNA , Ligands , Nucleotide Motifs , Base Pairing , DNA/chemistry , Cytosine/chemistry
7.
Nat Commun ; 15(1): 1915, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429336

ABSTRACT

Artificial biomolecular condensates are emerging as a versatile approach to organize molecular targets and reactions without the need for lipid membranes. Here we ask whether the temporal response of artificial condensates can be controlled via designed chemical reactions. We address this general question by considering a model problem in which a phase separating component participates in reactions that dynamically activate or deactivate its ability to self-attract. Through a theoretical model we illustrate the transient and equilibrium effects of reactions, linking condensate response and reaction parameters. We experimentally realize our model problem using star-shaped DNA motifs known as nanostars to generate condensates, and we take advantage of strand invasion and displacement reactions to kinetically control the capacity of nanostars to interact. We demonstrate reversible dissolution and growth of DNA condensates in the presence of specific DNA inputs, and we characterize the role of toehold domains, nanostar size, and nanostar valency. Our results will support the development of artificial biomolecular condensates that can adapt to environmental changes with prescribed temporal dynamics.


Subjects
Biomolecular Condensates , DNA Packaging , DNA Replication , Gene Conversion , Nucleotide Motifs
8.
Bioinformatics ; 40(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38317052

ABSTRACT

MOTIVATION: Accurate prediction of RNA subcellular localization plays an important role in understanding cellular processes and functions. Although post-transcriptional processes are governed by trans-acting RNA binding proteins (RBPs) through interaction with cis-regulatory RNA motifs, current methods do not incorporate RBP-binding information. RESULTS: In this article, we propose DeepLocRNA, an interpretable deep-learning model that leverages a pre-trained multi-task RBP-binding prediction model to predict the subcellular localization of RNA molecules via fine-tuning. We constructed DeepLocRNA using a comprehensive dataset with variant RNA types and evaluated it on the held-out dataset. Our model achieved state-of-the-art performance in predicting RNA subcellular localization in mRNA and miRNA. It has also demonstrated great generalization capabilities, performing well on both human and mouse RNA. Additionally, a motif analysis was performed to enhance the interpretability of the model, highlighting signal factors that contributed to the predictions. The proposed model provides general and powerful prediction abilities for different RNA types and species, offering valuable insights into the localization patterns of RNA molecules and contributing to our understanding of cellular processes at the molecular level. A user-friendly web server is available at: https://biolib.com/KU/DeepLocRNA/.


Subjects
Deep Learning , Animals , Humans , Mice , RNA/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Nucleotide Motifs , RNA-Binding Proteins/metabolism , Computational Biology/methods
9.
Nucleic Acids Res ; 52(5): 2188-2197, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38364855

ABSTRACT

i-Motifs (iMs) are secondary structures formed in cytosine-rich DNA sequences and are involved in multiple functions in the genome. Although putative iM-forming sequences are widely distributed in the human genome, the folding status and strength of putative iMs vary dramatically. Much previous research on iMs has focused on assessing their folding properties using biophysical experiments. However, there are no dedicated computational tools for predicting the folding status and strength of iM structures. Here, we introduce a machine learning pipeline, iM-Seeker, to predict both the folding status and the structural stability of DNA iMs. The programme iM-Seeker incorporates a Balanced Random Forest classifier trained on genome-wide iMab antibody-based CUT&Tag sequencing data to predict the folding status and an Extreme Gradient Boosting regressor to estimate the folding strength according to both literature biophysical data and our in-house biophysical experiments. iM-Seeker predicts DNA iM folding status with a classification accuracy of 81% and estimates the folding strength with a coefficient of determination (R²) of 0.642 on the test set. Model interpretation confirms that the nucleotide composition of the C-rich sequence significantly affects iM stability, with a positive correlation with sequences containing cytosine and thymine and a negative correlation with guanine and adenine.
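The two figures quoted above, classification accuracy and the coefficient of determination, follow standard definitions. The sketch below shows how such metrics are commonly computed; the function names and toy values are illustrative and do not reproduce the iM-Seeker evaluation code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values: folded (1) or unfolded (0) labels, and made-up stability scores.
folding_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
strength_r2 = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(folding_accuracy, strength_r2)
```

An R² of 0 corresponds to a model that always predicts the mean, and 1 to a perfect fit, so the reported 0.642 sits between those two reference points.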


Subjects
DNA , Machine Learning , Nucleotide Motifs , Humans , Base Sequence , Cytosine/chemistry , DNA/chemistry , DNA/genetics
10.
Adv Sci (Weinh) ; 11(12): e2304519, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38227373

ABSTRACT

The regulation of gene expression by light enables the versatile, spatiotemporal manipulation of biological function in bacterial and mammalian cells. Optoribogenetics extends this principle by molecular RNA devices acting on the RNA level whose functions are controlled by the photoinduced interaction of a light-oxygen-voltage photoreceptor with cognate RNA aptamers. Here light-responsive ribozymes, denoted optozymes, which undergo light-dependent self-cleavage and thereby control gene expression are described. This approach transcends existing aptamer-ribozyme chimera strategies that predominantly rely on aptamers binding to small molecules. The optozyme method thus stands to enable the graded, non-invasive, and spatiotemporally resolved control of gene expression. Optozymes are found efficient in bacteria and mammalian cells and usher in hitherto inaccessible optoribogenetic modalities with broad applicability in synthetic and systems biology.


Subjects
RNA, Catalytic , RNA , Animals , Nucleotide Motifs , RNA/genetics , RNA, Catalytic/chemistry , RNA, Catalytic/genetics , RNA, Catalytic/metabolism , Bacteria/metabolism , Gene Expression , Mammals/metabolism
11.
Nucleic Acids Res ; 52(4): e20, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38214231

ABSTRACT

Numerous statistical methods have emerged for inferring DNA motifs for transcription factors (TFs) from genomic regions. However, the process of selecting informative regions for motif inference remains understudied. Current approaches select regions with strong ChIP-seq signal for a given TF, assuming that such strong signal primarily results from specific interactions between the TF and its motif. Additionally, these selection approaches do not account for non-target motifs, i.e. motifs of other TFs; they presume the occurrence of these non-target motifs infrequent compared to that of the target motif, and thus assume these have minimal interference with the identification of the target. Leveraging extensive ChIP-seq datasets, we introduced the concept of TF signal 'crowdedness', referred to as C-score, for each genomic region. The C-score helps in highlighting TF signals arising from non-specific interactions. Moreover, by considering the C-score (and adjusting for the length of genomic regions), we can effectively mitigate interference of non-target motifs. Using these tools, we find that in many instances, strong ChIP-seq signal stems mainly from non-specific interactions, and the occurrence of non-target motifs significantly impacts the accurate inference of the target motif. Prioritizing genomic regions with reduced crowdedness and short length markedly improves motif inference. This 'less-is-more' effect suggests that ChIP-seq region selection warrants more attention.


Subjects
Genomics , Nucleotide Motifs , Transcription Factors , Binding Sites , Chromatin Immunoprecipitation , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
12.
Bioinformatics ; 40(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38230755

ABSTRACT

MOTIVATION: The prediction of RNA structure canonical base pairs from a single sequence, especially pseudoknotted ones, remains challenging in thermodynamic models that approximate the energy of the local 3D motifs joining canonical stems. It has become more and more apparent in recent years that the structural motifs in the loops, composed of noncanonical interactions, are essential for the final shape of the molecule, enabling its multiple functions. Our capacity to predict accurate 3D structures is also limited when it comes to the organization of the large, intricate network of interactions that form inside those loops. RESULTS: We previously developed the integer programming framework RNA Motifs over Integer Programming (RNAMoIP) to reconcile RNA secondary structure and local 3D motif information available in databases. We further develop our model to now simultaneously predict the canonical base pairs (with pseudoknots) from base pair probability matrices with or without alignment. We benchmarked our new method over all nonredundant RNAs below 150 nucleotides. We show that the joint prediction of canonical base pair structure and local conserved motifs (i) improves the ratio of well-predicted interactions in the secondary structure, (ii) predicts well canonical and wobble pairs at the location where motifs are inserted, (iii) is greatly improved with evolutionary information, and (iv) predicts noncanonical motifs at kink-turn locations. AVAILABILITY AND IMPLEMENTATION: The source code of the framework is available at https://gitlab.info.uqam.ca/cbe/RNAMoIP and an interactive web server at https://rnamoip.cbe.uqam.ca/.


Subjects
Algorithms , RNA , RNA/chemistry , Nucleic Acid Conformation , Software , Nucleotide Motifs
13.
Bioinformatics ; 40(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38291894

ABSTRACT

MOTIVATION: Up to 75% of the human genome encodes RNAs. The function of many non-coding RNAs relies on their ability to fold into 3D structures. Specifically, nucleotides inside secondary structure loops form non-canonical base pairs that help stabilize complex local 3D structures. These RNA 3D motifs can promote specific interactions with other molecules or serve as catalytic sites. RESULTS: We introduce PERFUMES, a computational pipeline to identify 3D motifs that can be associated with observable features. Given a set of RNA sequences with associated binary experimental measurements, PERFUMES searches for RNA 3D motifs using BayesPairing2 and extracts those that are over-represented in the set of positive sequences. It also conducts a thermodynamics analysis of the structural context that can support the interpretation of the predictions. We illustrate PERFUMES' usage on the SNRPA protein binding site, for which the tool retrieved both previously known binder motifs and new ones. AVAILABILITY AND IMPLEMENTATION: PERFUMES is an open-source Python package (https://jwgitlab.cs.mcgill.ca/arnaud_chol/perfumes).
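The abstract does not say which statistic PERFUMES uses to call a motif over-represented in the positive set. One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an assumption; the function and parameter names are hypothetical and are not taken from the package.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_positive, n_with_motif, n_positive_with_motif):
    """One-sided p-value P(X >= observed) under the hypergeometric null.

    n_total: all sequences; n_positive: sequences with a positive measurement;
    n_with_motif: sequences carrying the motif;
    n_positive_with_motif: positive sequences carrying the motif (observed).
    """
    upper = min(n_positive, n_with_motif)
    denominator = comb(n_total, n_positive)
    tail = sum(
        comb(n_with_motif, k) * comb(n_total - n_with_motif, n_positive - k)
        for k in range(n_positive_with_motif, upper + 1)
    )
    return tail / denominator


# Toy case: 10 sequences, 5 positives, 5 motif carriers, all 5 carriers positive.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
print(p_value)
```

A small p-value indicates that the motif appears in the positive sequences more often than random assignment would explain; when many motifs are tested, a multiple-testing correction would also be needed.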


Subjects
Perfume , Humans , Nucleic Acid Conformation , Nucleotide Motifs , Base Pairing , RNA/chemistry
14.
Bioorg Med Chem ; 98: 117580, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38194737

ABSTRACT

We here report a new molecule DoNA binding to a CAG repeat RNA. DoNA is a dimer of the NA molecule that we previously reported. NA binds with high affinity to a CAG repeat DNA but not significantly to a CAG repeat RNA. Binding analyses using SPR and CSI-TOF MS indicated a significant increase in the affinity of DoNA to a single stranded CAG repeat RNA compared to NA. Systematic investigation of the RNA motifs bound by DoNA using hairpin RNA models revealed that DoNA binds to the CAG units at overhang and terminal positions, and notably, it binds to the structurally flexible internal and hairpin loop region.


Subjects
RNA , Trinucleotide Repeats , RNA/chemistry , DNA/chemistry , Nucleotide Motifs
15.
Int J Mol Sci ; 25(1), 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38203852

ABSTRACT

Circular RNAs (circRNAs) are a recently characterized family of gene transcripts forming a covalently closed loop of single-stranded RNA. The extent of their potential for fine-tuning gene expression is still being discovered. Several studies have implicated certain circular RNAs in pathophysiological processes within vascular endothelial cells and cancer cells independently. However, to date, no comparative study of circular RNA expression in different types of endothelial cells has been performed and analysed through the lens of their central role in vascular physiology and pathology. In this work, we analysed publicly available and original RNA sequencing datasets from arterial, venous, and lymphatic endothelial cells to identify common and distinct circRNA expression profiles. We identified 4713 distinct circRNAs in the compared endothelial cell types, 95% of which originated from exons. Interestingly, the results show that the expression profile of circular RNAs is much more specific to each cell type than that of linear RNAs, and therefore appears to be more suitable for distinguishing between them. As a result, we have discovered a specific circRNA signature for each given endothelial cell type. Furthermore, we identified a specific endothelial cell circRNA signature that is composed of four circRNAs: circCARD6, circPLXNA2, circCASC15 and circEPHB4. These circular RNAs are produced by genes that are related to endothelial cell migration pathways and cancer progression. More detailed studies of their functions could lead to a better understanding of the mechanisms involved in physiological and pathological (lymph)angiogenesis and might open new ways to tackle tumour spread through the vascular system.


Subjects
Endothelial Cells , RNA, Circular , RNA, Circular/genetics , Nucleotide Motifs , RNA/genetics , Cell Movement
16.
Biochem Biophys Res Commun ; 691: 149327, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38039839

ABSTRACT

Although the structures of many RNA loops, such as GNRA and UNCG tetraloops, are well known, it is still possible to find more RNA structures. In the present study, the solution structure of an RNA fragment having a UUCGA pentaloop was analyzed by NMR spectroscopy. It was found that the UUCG tetraloop is formed and the adenosine residue at the 3' side of the tetraloop is bulged out. The characteristic motif of the loop-bulge structure has also been found in other RNAs, including CUUGU and CUGGC pentaloops. Along with the recently found T-hairpin structure with a UUUGAUU loop, in which a UUUGA pentaloop and a UU bulge are formed, the loop-bulge structures can be categorized as an RNA motif, which may be called the integrated structure loop (I-loop).


Subjects
RNA , Nucleic Acid Conformation , RNA/chemistry , Nucleotide Motifs , Magnetic Resonance Spectroscopy
17.
Comput Biol Med ; 168: 107753, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38039889

ABSTRACT

BACKGROUND: Trans-acting factors are of special importance in transcription regulation, which is a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes. The progressive development in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups where trans-acting DNA motif groups can be discovered. The problem difficulty lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose to develop MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling for accommodating the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge for model fitting with latent variables. The results reflect that our proposed approaches demonstrate promising performance with linear time complexities. CONCLUSION: MotifHub is a novel algorithm considering the identification of both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.


Subjects
Algorithms , Software , Nucleotide Motifs/genetics , Reproducibility of Results , Chromatin/genetics
18.
Nucleic Acids Res ; 52(D1): D154-D163, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971293

ABSTRACT

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
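For readers unfamiliar with position weight matrices, the sketch below builds a log-odds matrix from per-position base counts and scans a sequence for the best-scoring window. It is a generic illustration with made-up counts, a uniform background, and hypothetical function names; it does not use the HOCOMOCO file formats or thresholds.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts mapping "A", "C", "G", "T" to counts, one dict per position.
    """
    pwm = []
    for column in counts:
        total = sum(column[base] for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, seq):
    """Return (score, position) of the best-scoring window; seq must be upper-case ACGT."""
    width = len(pwm)
    scored = [
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scored)


# Made-up counts for a 3 bp motif that strongly prefers "TGA".
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = pwm_from_counts(toy_counts)
score, position = best_hit(pwm, "CCTGACC")
print(round(score, 3), position)
```

The pseudocount keeps unseen bases from producing a log of zero, which is why every window receives a finite score.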


Subjects
Databases, Genetic , Gene Expression Regulation , Protein Interaction Domains and Motifs , Transcription Factors , Animals , Humans , Mice , Binding Sites/genetics , Nucleotide Motifs , Transcription Factors/genetics , Transcription Factors/metabolism , Internet , Protein Interaction Domains and Motifs/genetics
19.
Nucleic Acids Res ; 52(D1): D222-D228, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37850642

ABSTRACT

MethMotif (https://methmotif.org) is a publicly available database that provides a comprehensive repository of transcription factor (TF)-binding profiles, enriched with DNA methylation patterns. In this release, we have enhanced the platform, expanding our initial collection to over 700 position weight matrices (PWM), all of which include DNA methylation profiles. One of the key advancements in this release is the segregation of TF-binding motifs based on their cofactors and DNA methylation status. We have previously demonstrated that gene ontology (GO) enriched terms associated with TF target genes may differ based on their association with alternative cofactors and DNA methylation status. MethMotif provides precomputed GO annotations for each human TF of interest, as well as for TF-co-TF complexes, enabling a comprehensive analysis of TF functions in the context of their co-factors. Additionally, MethMotif has been updated to encompass data for two new species, Mus musculus and Arabidopsis thaliana, widening its applicability to a broader community. MethMotif stands out as the first and only TF-binding motifs database to incorporate context-specific PWM coupled with epigenetic information, thereby enlightening context-specific TF functions. This enhancement allows the community to explore and gain deeper insights into the regulatory mechanisms governing transcriptional processes.


Subjects
DNA Methylation , Databases, Genetic , Transcription Factors , Animals , Humans , Mice , Binding Sites , Molecular Sequence Annotation , Nucleotide Motifs , Protein Binding , Transcription Factors/metabolism
20.
Nucleic Acids Res ; 52(D1): D311-D321, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37602392

ABSTRACT

Discoveries over the recent decade have demonstrated the unexpected diversity of telomere DNA motifs in nature. However, currently available resources, 'Telomerase database' and 'Plant rDNA database', contain just fragments of all relevant literature published over decades of telomere research as they have a different primary focus and limited updates. To fill this gap, we gathered data about telomere DNA sequences from a thorough literature screen as well as by analysing publicly available NGS data, and we created TeloBase (http://cfb.ceitec.muni.cz/telobase/) as a comprehensive database of information about telomere motif diversity. TeloBase is supplemented by internal taxonomy utilizing popular on-line taxonomic resources that enables in-house data filtration and graphical visualisation of telomere DNA evolutionary dynamics in the form of heat tree plots. TeloBase avoids overreliance on administrators for future data updates by having a simple form and community-curation system for application and approval, respectively, of new telomere sequences by users, which should ensure timeliness of the database and topicality. To demonstrate TeloBase utility, we examined telomere motif diversity in species from the fungal genus Aspergillus, and discovered (TTTATTAGGG)n sequence as a putative telomere motif in the plant family Chrysobalanaceae. This was bioinformatically confirmed by analysing template regions of identified telomerase RNAs.
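A telomere motif written as (TTTATTAGGG)n denotes n back-to-back copies of the unit. The sketch below counts the longest such run in a sequence; it is a simple illustration with a synthetic read and a hypothetical function name, not the NGS analysis used to build TeloBase.

```python
import re


def longest_tandem_run(seq, motif):
    """Return the largest number of back-to-back copies of motif found in seq."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(seq.upper())
    ]
    return max(runs, default=0)


# Synthetic read: three tandem copies, a spacer, then one more copy.
read = "ACGT" + "TTTATTAGGG" * 3 + "CCA" + "TTTATTAGGG"
copies = longest_tandem_run(read, "TTTATTAGGG")
print(copies)
```

Real telomere reads may start at any phase of the repeat and contain sequencing errors, so a production search would also need to consider rotations of the motif and tolerate mismatches.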


Subjects
Databases, Genetic , Telomerase , Nucleotide Motifs , Plants/genetics , Telomerase/genetics , Telomere/genetics , Telomere/metabolism